Bilingual Distributed Phrase Representations for Statistical Machine Translation

Authors

  • Peyman Passban
  • Chris Hokamp
  • Qun Liu
Abstract

Phrase-based machine translation (PBMT) relies on the phrase table as its main source of bilingual knowledge at decoding time. In its basic form, a phrase table contains aligned phrase pairs along with four probabilities that capture aspects of the co-occurrence statistics of each pair. In this paper we add a new semantic similarity score to the phrase table as an additional statistical feature. The new feature is inferred from a bilingual corpus by a neural network (NN) and estimates the semantic similarity of each source and target phrase pair. We evaluate our model on the English–French (En–Fr) and English–Farsi (En–Fa) language pairs. We observe a significant increase in system performance with the new feature, with improvements in all translation directions of En↔Fr and En↔Fa.
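The idea of appending a similarity score to each phrase-table entry can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the `|||`-delimited line layout (Moses-style), the toy `EMBEDDINGS` dictionary, and the helper names `phrase_vector`, `cosine`, and `add_similarity_feature`. In particular, averaging word vectors and taking the cosine is a simple stand-in for the neural network described in the abstract, not the authors' model.

```python
import math

# Toy word vectors in a shared bilingual space.
# Hypothetical values, chosen only so the example runs.
EMBEDDINGS = {
    "the": [0.1, 0.1, 0.1],
    "house": [0.9, 0.1, 0.0],
    "la": [0.1, 0.1, 0.2],
    "maison": [0.8, 0.2, 0.0],
    "chat": [0.0, 0.9, 0.1],
}


def phrase_vector(phrase):
    """Average the known word vectors of a phrase (stand-in for a learned NN encoder)."""
    vectors = [EMBEDDINGS[w] for w in phrase.split() if w in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def add_similarity_feature(line):
    """Append one similarity score to the score field of a phrase-table line.

    Assumes the layout 'source ||| target ||| scores [||| ...]'.
    """
    fields = [f.strip() for f in line.split("|||")]
    src, tgt, scores = fields[0], fields[1], fields[2].split()
    u, v = phrase_vector(src), phrase_vector(tgt)
    sim = cosine(u, v) if u is not None and v is not None else 0.0
    scores.append(f"{sim:.4f}")
    fields[2] = " ".join(scores)
    return " ||| ".join(fields)


if __name__ == "__main__":
    # A plausible pair and an implausible pair, each with four baseline scores.
    print(add_similarity_feature("the house ||| la maison ||| 0.5 0.4 0.6 0.3"))
    print(add_similarity_feature("the house ||| la chat ||| 0.1 0.1 0.1 0.1"))
```

Each output line carries five scores instead of four. The decoder would then need a tuned weight for the extra feature, which this sketch does not cover.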

Similar resources

Learning Bilingual Distributed Phrase Representations for Statistical Machine Translation

Following the idea of using distributed semantic representations to facilitate the computation of semantic similarity between translation equivalents, we propose a novel framework to learn bilingual distributed phrase representations for machine translation. We first induce vector representations for words in the source and target language respectively, in their own semantic space. These word v...

Transduction Recursive Auto-Associative Memory: Learning Bilingual Compositional Distributed Vector Representations of Inversion Transduction Grammars

We introduce TRAAM, or Transduction RAAM, a fully bilingual generalization of Pollack’s (1990) monolingual Recursive Auto-Associative Memory neural network model, in which each distributed vector represents a bilingual constituent—i.e., an instance of a transduction rule, which specifies a relation between two monolingual constituents and how their subconstituents should be permuted. Bilingual ...

Learning Bilingual Phrase Representations with Recurrent Neural Networks

We introduce a novel method for bilingual phrase representation with Recurrent Neural Networks (RNNs), which transforms a sequence of word feature vectors into a fixed-length phrase vector across two languages. Our method measures the difference between the vectors of source- and target-side phrases, and can be used to predict the semantic equivalence of source and target word sequences in the ph...

Bilingual Correspondence Recursive Autoencoder for Statistical Machine Translation

Learning semantic representations and tree structures of bilingual phrases is beneficial for statistical machine translation. In this paper, we propose a new neural network model called Bilingual Correspondence Recursive Autoencoder (BCorrRAE) to model bilingual phrases in translation. We incorporate word alignments into BCorrRAE to allow it to freely access bilingual constraints at different leve...

Improve Statistical Machine Translation with Context-Sensitive Bilingual Semantic Embedding Model

We investigate how to improve bilingual embedding, which has been successfully used as a feature in phrase-based statistical machine translation (SMT). Despite bilingual embedding’s success, contextual information, which is of critical importance to translation quality, was ignored in previous work. To employ the contextual information, we propose a simple and memory-efficient model for lear...

Publication date: 2015